Discussion on Response Time
Learn to estimate serial and parallel response times, and understand common optimization techniques.
Calculate response time using parallel processing#
Let’s use the equation for response time with which we are already acquainted:

Response time = Latency + Processing time        (1)

where Latency = Time_base + RTT + Time_download.
In parallel processing, the API gateway communicates with all the subservices simultaneously, as shown in the following illustration:
We’ll use the processing time to calculate the response time of an API. For that, recall the latency numbers we estimated in the latency lesson to measure the response time of GET and POST requests. Let’s populate the numbers in equation (1):
Response time for a GET request

Response time for a POST request
The response time is significantly reduced on subsequent requests because the base time is omitted once the response is served from a cache:
Response time for a GET request

Response time for a POST request
Note: The 4 ms processing time indicates the parallel processing time of the services (as shown in the illustration) and in ideal cases, it remains the same.
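To make this arithmetic concrete, here is a minimal Python sketch of the parallel estimate. The specific numbers (a 10 ms download time and three 4 ms subservices) are illustrative assumptions based on this chapter’s reference values, not measurements.

```python
# Back-of-the-envelope estimate for parallel processing: the gateway
# fans out to all subservices at once, so the processing time is the
# maximum over the subservices, not the sum. All numbers below are
# illustrative reference values, not measurements.
def response_time_parallel(base_ms, rtt_ms, download_ms, service_times_ms):
    processing_ms = max(service_times_ms)  # the slowest subservice dominates
    return base_ms + rtt_ms + download_ms + processing_ms

# Assumed: a GET request, 10 ms download time, three 4 ms subservices.
print(response_time_parallel(120.5, 70, 10, [4, 4, 4]))  # 204.5 ms

# Cached response: the base time is omitted on subsequent requests.
print(response_time_parallel(0, 70, 10, [4, 4, 4]))      # 84 ms
```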
Calculate response time using serial processing#
Let's suppose the API gateway communicates serially with all the subservices (one after the other); in that case, the processing time will be the sum of all the times taken by subservices.
According to the illustration above, each service provider’s processing time will be 4 ms. The total processing time for all service providers is given below:
Processing time (serial) = sum of the subservices’ individual processing times
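The serial total can be sketched in a line of Python; the assumption of three subservices at 4 ms each is for illustration only.

```python
# Serial processing: the gateway calls subservices one after another,
# so the total processing time is the sum of the individual times.
service_times_ms = [4, 4, 4]  # assumption: three subservices, 4 ms each
total_processing_ms = sum(service_times_ms)
print(total_processing_ms)  # 12 (ms)
```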
The response time is calculated by putting the values of latency and processing time in equation (1):
Response time for a GET request

Response time for a POST request
In the case of cached information, the response time is given as follows:
Response time for a GET request

Response time for a POST request
These response-time numbers are largely network-dependent and can vary greatly. Let’s discuss the optimization techniques in the subsequent section.
Discussion#
The latency, processing, and response times we estimated in this chapter are obtained through the API testing tool (Postman) or gauged using standard latencies of computational and networking devices. Through some practical experiments and theoretical formulations, we have estimated ranges of response times observed routinely in different phases of the lifecycle of an API call.
However, these numbers are not definite, and depend on various parameters. We merely performed back-of-the-envelope calculations to improve our understanding and pave the way for estimating a latency and processing budget for the design problems we aim to solve. In some cases, the system’s complexity will lead to a higher response time. In such cases, we can perform several optimization techniques, as detailed below.
Response time optimization#
An API with an average response time below 200 milliseconds is said to respond instantly, and such a service is broadly categorized as a real-time service. In general, acceptable response times range between 0.1 and 1 second. If the response falls outside this range, customer satisfaction is at risk, and the API needs optimization.
The following factors of an API play a vital role in determining its response time:
Optimized network: In many cases, a high response time is due to high network latency. This factor isn’t easy to manage, since services communicate over shared mediums on the Internet. A service might need to employ edge data centers near its customers to reduce network latency. An example of such a mechanism is Netflix’s streaming servers, which reside inside many ISPs to provide high-bandwidth, low-latency service to customers. Another example is hyperscalers such as Google, which has deployed its own private wide-area network for low latency over long distances; traffic usually moves from Google’s network to the public Internet near the customer (typically at an IXP or ISP). Additionally, as a service provider, we’ll need to determine whether the bottleneck is bandwidth, latency, or occasionally both.
Optimized database: Query execution time can significantly affect the response time, and slow queries are a main cause of high response times. An optimized query and schema in relational databases can help reduce query execution time. Apart from the queries themselves, the choice of data storage technology also plays a vital role in how quickly data is stored and retrieved.
Prefetch data: In some cases, we can prefetch frequently used data from the database, anticipating requests from users. Publicly available data is the best candidate for prefetching, since it can be prepared ahead of time for any user to optimize response time.
Compress media files: Generally, API responses take longer to reach the client if large files are sent over the network. If the machines on both ends are powerful enough, it is best to compress and/or encode large payloads before sending them over the network.
CDN assistance: If some of the data is frequently fetched by many users, it is best to serve it from a nearby CDN. Usually, the data related to media, such as images or videos, is fetched using CDN.
Use of API monitoring tools: In certain cases, the cause of the delayed response is not apparent. API monitoring tools can help to identify the root cause of laggy APIs.
Effective bot management: The world is full of bots sending countless requests to service providers. Implementing effective bot management techniques will help us avoid entertaining useless requests.
Appropriate hosting service: The choice of hosting can affect the response time; small services on shared hosting, in particular, may suffer from slower responses.
Data centers with optimized resources: Data centers should have resources provisioned according to the nature of the enterprise solution. Moreover, they should be located as close to the target users as possible.
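To illustrate the database-optimization point above, the following sketch uses Python’s standard-library sqlite3 with a made-up `users` table: adding an index turns a full table scan into an index search, which is exactly the kind of change that shrinks query execution time.

```python
import sqlite3

# Hypothetical schema for illustration: a "users" table queried by email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(10_000)])

query = "SELECT id FROM users WHERE email = ?"

# Without an index, SQLite must scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query,
                           ("user42@example.com",)).fetchall()
print(plan_before)  # the plan's detail column typically reads "SCAN users"

# An index turns the scan into a logarithmic-time search.
conn.execute("CREATE INDEX idx_users_email ON users(email)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query,
                          ("user42@example.com",)).fetchall()
print(plan_after)  # now "SEARCH users USING INDEX idx_users_email"
```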
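The prefetching idea can be sketched as a cache warmed before any user asks; `fetch_from_db` and the key names below are hypothetical stand-ins for a real query and real data.

```python
# Prefetching: warm an in-memory cache with frequently used public data
# before users request it, so the first request skips the database trip.
def fetch_from_db(key):
    # Placeholder for a (slow) database lookup.
    return f"value-for-{key}"

cache = {}

def prefetch(keys):
    for key in keys:
        cache[key] = fetch_from_db(key)

def get(key):
    if key in cache:                     # cache hit: no database round trip
        return cache[key]
    cache[key] = fetch_from_db(key)      # cache miss: fetch and remember
    return cache[key]

prefetch(["trending_posts", "popular_videos"])
print(get("trending_posts"))  # served from the warmed cache
```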
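For the compression point, the sketch below uses Python’s standard-library gzip on a synthetic, highly repetitive payload; real media in already-compressed formats (JPEG, H.264) will gain far less.

```python
import gzip

# Synthetic, highly compressible payload standing in for a large response.
payload = b"<item>example media metadata</item>" * 10_000

compressed = gzip.compress(payload)
print(len(payload), "->", len(compressed), "bytes")

# Compression is lossless: the receiver recovers the exact original bytes.
assert gzip.decompress(compressed) == payload
```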
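One basic bot-management measure is per-client rate limiting. The sketch below is a minimal fixed-window counter (the limit and window values are arbitrary), not a production bot detector.

```python
import time
from collections import defaultdict

# Fixed-window rate limiter: allow at most `limit` requests per client
# per `window` seconds; excess requests are rejected without processing.
class RateLimiter:
    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)
        self.window_start = defaultdict(float)

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        if now - self.window_start[client_id] >= self.window:
            self.window_start[client_id] = now  # start a new window
            self.counts[client_id] = 0
        self.counts[client_id] += 1
        return self.counts[client_id] <= self.limit

limiter = RateLimiter(limit=3, window=60.0)
results = [limiter.allow("bot-1", now=0.0) for _ in range(5)]
print(results)  # [True, True, True, False, False]
```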
Note: A GET request can be faster than a POST request for the following reasons:

GET requests can be cached; POST requests generally can’t.
For GET, values are sent in the URL (query string), whereas for POST, data is sent in the body of the request.
Only ASCII characters are allowed in a GET URL, whereas POST can use binary data types as well.
The POST request uses multipart/form-data encoding for binary data (file uploads).
Summary#
We’ve estimated the latency and processing time of both GET and POST requests. From these estimations, we have concluded that we’ll use the following reference numbers of base time, RTT, and download time to estimate latency in our design problems later in this course:
Reference Numbers
| Request Type | Time_base | RTT | Time_download | Time_processing |
|--------------|-----------|-----|---------------|-----------------|
| GET | Minimum time = 120.5 ms, Maximum time = 201.5 ms | 70 ms | 0.4 * size of response | Minimum time = 4 ms, Maximum time = variable |
| POST | 260 ms + 1.15 * size of request | 1.7 ms | | |
Note: The processing time can be 12 ms, 23 ms, or 69 ms (depending on the processing involved, the number of services, the distance between them, and the type of operation they perform) and can vary accordingly.
The above numbers can vary during experiments due to different factors (mainly due to the network), so we’ll provide the opportunity to change these numbers in calculators (by making them input fields) to find the response time of an API in future lessons.
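As a stand-in for those input-field calculators, here is a small parameterized Python function; all defaults are minimum GET reference values from this chapter and are meant to be overridden with measured numbers.

```python
# Parameterized response-time calculator. Defaults are the minimum GET
# reference values from this chapter; pass measured values to override
# (e.g., download should be roughly 0.4 * size of response).
def response_time_ms(base=120.5, rtt=70.0, download=0.0, processing=4.0,
                     cached=False):
    # A cached response skips the base (setup) time on subsequent requests.
    latency = rtt + download + (0.0 if cached else base)
    return latency + processing

print(response_time_ms())             # uncached GET: 194.5 ms
print(response_time_ms(cached=True))  # cached GET: 74.0 ms
```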
Quiz#
Let’s suppose a user interacts on a social media platform where posts with media files such as images and videos are displayed. The HTTP requests are sent to fetch the content to display to a user. Considering this scenario, answer the following quiz questions:
Quiz
(Select all that apply.) Which strategies can reduce the processing time when fetching large amounts of data from an API?
Optimizing database
Yes, we can optimize the database to speed up query execution.
Optimizing bandwidth
Bandwidth optimization does not fall within the scope of processing time.
Placing servers within a zone or region.
Placing servers at the nearest locations can greatly reduce processing time, as we discovered while estimating processing time.